Efficiency improvement of exp(::StridedMatrix) with UniformScaling and mul! #40668

jarlebring · 2021-04-30T09:04:22Z

Avoids constructing the dense identity Inn=Matrix{T}(I,n,n). This reduces the number of
a) matrix-matrix products for nA<=2.1: commit d94b513
b) additions of zeros for nA>2.1: commit e6860a4

This PR also reduces memory allocations by using mul!.
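As a rough illustration of the two ingredients (a minimal sketch, not the code in dense.jl; the names `X_dense`, `X_lazy`, `buf` and the scalar `c` are made up for this example), adding a scaled identity through UniformScaling avoids building the dense matrix, and mul! writes a product into a preallocated buffer:

```julia
using LinearAlgebra

n = 4
A = randn(n, n)
c = 2.5                          # hypothetical scalar, stands in for a coefficient

# Dense identity: allocates an n-by-n matrix, then adds every entry, zeros included.
Inn = Matrix{Float64}(I, n, n)
X_dense = A + c * Inn

# UniformScaling: c*I is never materialized as a matrix; only the diagonal differs from A.
X_lazy = A + c * I

# In-place product: A*A is written into an existing buffer instead of a fresh array.
buf = similar(A)
mul!(buf, A, A)
```

Both sums give the same matrix; the difference is only in the work and memory used to get there.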

Two CPU-time illustrations:
a) nA<=2.1

Random.seed!(0);
A=randn(100,100); A=1e-13*A/opnorm(A,1);

Original vs new:

  884.357 μs (31 allocations: 1.07 MiB)
  741.872 μs (21 allocations: 705.69 KiB)

b) nA>2.1

A=randn(50,50);
A=100*A/opnorm(A,1);

Original vs new:

  657.075 μs (43 allocations: 393.97 KiB)
  614.814 μs (45 allocations: 355.81 KiB)

The most important improvement is case a), where the number of matrix-matrix products is reduced.
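One way a dense identity can cost an extra product is sketched below. This is an assumption about the general pattern, not a copy of the exp! code: the functions `poly_from_identity` and `poly_from_A2` and the coefficients are hypothetical. If the sequence of powers of A2 is started from a dense identity, the first product is just I*A2; starting from A2 and adding the constant term through UniformScaling skips it.

```julia
using LinearAlgebra

# Evaluates c[1]*I + c[2]*A2 + c[3]*A2^2 + ... starting the powers from a dense identity.
# Returns the result and the number of matrix-matrix products performed.
function poly_from_identity(A2, c)
    n = size(A2, 1)
    P = Matrix{eltype(A2)}(I, n, n)
    S = c[1] * P
    nprod = 0
    for k in 2:length(c)
        P = P * A2               # the first pass only computes I*A2
        nprod += 1
        S += c[k] * P
    end
    return S, nprod
end

# Same polynomial, but the powers start at A2 and the constant term uses UniformScaling.
function poly_from_A2(A2, c)
    P = A2
    S = c[2] * P + c[1] * I
    nprod = 0
    for k in 3:length(c)
        P = P * A2
        nprod += 1
        S += c[k] * P
    end
    return S, nprod
end

coeffs = [1.0, 0.5, 0.25]        # hypothetical, not the Padé coefficients
A2 = [1.0 2.0; 3.0 4.0]
S1, n1 = poly_from_identity(A2, coeffs)
S2, n2 = poly_from_A2(A2, coeffs)
```

With three coefficients the first variant performs two products and the second performs one, while both return the same matrix.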

Solves JuliaLang/LinearAlgebra.jl#840.

jebej · 2021-05-01T22:28:47Z

Could you try a larger matrix size for (b)? At the size you tested (n=50) it looks like there are no improvements.

jarlebring · 2021-05-03T06:38:02Z

For case b, the relative difference shrinks as the matrix grows, because the computation time is dominated by the matrix-matrix products. The difference is slightly more visible if you "eliminate" the matrix-matrix products:

using LinearAlgebra # provides diagind and I

function exp_test_new(A,A2,A4,A6)
    T=eltype(A);
    CC = T[64764752532480000.,32382376266240000.,7771770303897600.,
           1187353796428800.,  129060195264000.,  10559470521600.,
           670442572800.,      33522128640.,      1323241920.,
           40840800.,           960960.,           16380.,
           182.,                1.]
    Ut = CC[4]*A2
    Ut[diagind(Ut)] .+=  CC[2]
    U  = ((CC[14].*A6 .+ CC[12].*A4 .+ CC[10].*A2) .+
              CC[8].*A6 .+ CC[6].*A4 .+ Ut)
    Vt = CC[3]*A2
    Vt[diagind(Vt)] .+=  CC[1]
    V  =  (CC[13].*A6 .+ CC[11].*A4 .+ CC[9].*A2) .+
        CC[7].*A6 .+ CC[5].*A4 .+ Vt
end


function exp_test_org(A,A2,A4,A6)
    T=eltype(A);
    CC = T[64764752532480000.,32382376266240000.,7771770303897600.,
           1187353796428800.,  129060195264000.,  10559470521600.,
           670442572800.,      33522128640.,      1323241920.,
           40840800.,           960960.,           16380.,
           182.,                1.]

    n=size(A,1);
    Inn=Matrix{eltype(A)}(I,n,n);
    U  = ((CC[14].*A6 .+ CC[12].*A4 .+ CC[10].*A2) .+
              CC[8].*A6 .+ CC[6].*A4 .+ CC[4].*A2 .+ CC[2].*Inn)
    V  = (CC[13].*A6 .+ CC[11].*A4 .+ CC[9].*A2) .+
        CC[7].*A6 .+ CC[5].*A4 .+ CC[3].*A2 .+ CC[1].*Inn
end

We can do timing like this (using @btime from BenchmarkTools):

julia> n=100;
julia> A=randn(n,n); A2=A*A; A4=A2*A2; A6=A2*A4;
julia> @btime exp_test_org($A,$A2,$A4,$A6);
  165.635 μs (7 allocations: 234.80 KiB)
julia> @btime exp_test_new($A,$A2,$A4,$A6);
  148.215 μs (15 allocations: 314.91 KiB)
julia> n=500;
julia> A=randn(n,n); A2=A*A; A4=A2*A2; A6=A2*A4;
julia> @btime exp_test_org($A,$A2,$A4,$A6);
  4.209 ms (7 allocations: 5.72 MiB)
julia> @btime exp_test_new($A,$A2,$A4,$A6);
  3.813 ms (15 allocations: 7.64 MiB)
julia> n=2000;
julia> A=randn(n,n); A2=A*A; A4=A2*A2; A6=A2*A4;
julia> @btime exp_test_org($A,$A2,$A4,$A6);
  69.824 ms (7 allocations: 91.55 MiB)
julia> @btime exp_test_new($A,$A2,$A4,$A6);
  65.964 ms (15 allocations: 122.10 MiB)

The new code seems better for this range of n. For larger n the results seem inconclusive. Case a is the important modification in this PR.

Edit: After incorporating mul! usage, both a and b are better.

jarlebring · 2021-05-03T11:30:42Z

I think one can save one additional allocation in case b by replacing Vt with an in-place scaling of A2:

    Vt = A2; # Recycle the A2 memory 
    rmul!(Vt,CC[3])
    Vt[diagind(Vt)] .+=  CC[1]

and then precomputing, beforehand, the other terms of V that use A2. The code does get messy. I'm not convinced it is worthwhile.
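The reason for the precomputation can be seen in a small sketch (the scalars `c3` and `c1` are hypothetical stand-ins for CC[3] and CC[1]): `Vt = A2` only creates an alias, so after rmul! the array A2 itself holds the scaled values and the unscaled A2 is gone.

```julia
using LinearAlgebra

A2 = [1.0 2.0; 3.0 4.0]
A2_orig = copy(A2)               # kept only to check the result
c3, c1 = 2.0, 10.0               # hypothetical stand-ins for CC[3] and CC[1]

Vt = A2                          # alias: no new memory is allocated
rmul!(Vt, c3)                    # scales the shared storage in place
for i in diagind(Vt)
    Vt[i] += c1                  # add c1 on the diagonal only
end

# Vt and A2 are the same array, so A2 no longer holds the original values.
same_array = Vt === A2
```

Any later expression that still needs the unscaled A2 would therefore have to be evaluated before the rmul! call.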

stdlib/LinearAlgebra/src/dense.jl

dkarrasch · 2021-05-04T12:43:17Z

BTW, you might get rid of the massive CI failure by rebasing onto current master. There is some SuiteSparse checksums stuff going on. I was seeing all those failures and immediately started searching for typos. 😄

dkarrasch

LGTM, a nice improvement. Shall we merge?

jarlebring · 2021-05-06T07:07:16Z

Should I squash the commits? I would be happy to do it except my usual procedure with rebase -i HEAD~XX causes some trouble. I get strange git errors originating somehow from SuiteSparse.

dkarrasch · 2021-05-06T07:24:23Z

I would have squashed it when merging, so there is no need to do that. As for the many other commits, I use the following procedure: first pull the current master into your local master, then check out your branch, then

git pull --rebase origin master
git push --force

That will hopefully make all the intermediate commits go away.

jarlebring · 2021-05-06T07:56:04Z

Thanks. 👍 It seems the addition of the other commits spammed other PRs. Apologies. 🙈

jarlebring mentioned this pull request Apr 30, 2021

Julia expm too many mat-mats matrixfunctions/GraphMatFun.jl#33

Closed

dkarrasch added the "linear algebra" and "performance" labels Apr 30, 2021

dkarrasch reviewed May 4, 2021

jarlebring force-pushed the master branch from ce77a0d to 87fbeea Compare May 4, 2021 09:38

dkarrasch reviewed May 4, 2021

jarlebring changed the title ~~Efficiency improvement of exp(::StridedMatrix) with UniformScaling~~ Efficiency improvement of exp(::StridedMatrix) with UniformScaling and mul! May 4, 2021

dkarrasch reviewed May 4, 2021

dkarrasch approved these changes May 6, 2021

jarlebring force-pushed the master branch from 0390c7f to fab34b0 Compare May 6, 2021 07:16

jarlebring and others added 5 commits May 6, 2021 09:54

28d77af  Reduce nof matmat's in exp(StridedMatrix)
44b9eaa  exp(::StridedMatrix): use Uniformscaling
ae811e8  exp!() In-place mul! for case nA<2.1
e10d87f  exp!() mul! for allocation economical also for nA > 2.1
0c3c520  exp!() Preserve original order in U update for nA>2.1
         Co-authored-by: Daniel Karrasch <daniel.karrasch@posteo.de>

jarlebring force-pushed the master branch from fab34b0 to 0c3c520 Compare May 6, 2021 07:54

dkarrasch added and removed the "merge me" label May 6, 2021

dkarrasch merged commit 15b5143 into JuliaLang:master May 6, 2021

jarlebring mentioned this pull request May 6, 2021

Five arg mul! for UniformScaling and improvement in exp! #40731

Merged

antoine-levitt pushed a commit (0a2a2a9) to antoine-levitt/julia that referenced this pull request May 9, 2021

dghosef pushed a commit (1fd97e4) to dghosef/julia that referenced this pull request May 11, 2021

shirodkara pushed a commit (07122ec) to shirodkara/julia that referenced this pull request Jun 9, 2021

johanmon pushed a commit (7b6259e) to johanmon/julia that referenced this pull request Jul 5, 2021

jarlebring mentioned this pull request Aug 12, 2021

One multiplication too many in _exp! SciML/ExponentialUtilities.jl#63

Closed

LilithHafner mentioned this pull request Oct 23, 2024

Correction of lcm() for Array arguments #56113

Open
